ABSTRACT

Procedures for evaluating the accuracy of Landsat-derived wildland cover
classifications are described and associated problems discussed. The evaluation
procedures include: 1) implementing a stratified random sample for obtaining
unbiased verification data; 2) performing area-by-area comparisons between
verification and Landsat data for both heterogeneous and homogeneous fields;
3) providing overall and individual classification accuracies with confidence
limits; 4) displaying results within contingency tables for analysis of confusion
between classes; and 5) quantifying the amount of information (bits/square
kilometer) conveyed in the Landsat classification.

Overall classification accuracies for a test site in northwestern Colorado were
low: 37.3 percent (range 35.8 to 38.7 percent) for the entire sampled population
and 61.3 percent (range 57.1 to 6h.2 percent) for the homogeneous areas. A further
evaluation was undertaken to examine possible errors not associated with the
Landsat classifications. Significant biases in classification accuracy were
attributed to defining class characteristics for verification pixels that were
not represented within the Landsat classifications. Analysis of sampled
verification designations showed that 90 percent of the pixels that were
misdesignated for verification disagreed with the Landsat classifications. Other
problems were found with misregistration between verification and Landsat fields,
photointerpretation errors in verification field designations, and different
class definitions used for the Landsat classifications and the verification
fields. An underlying factor contributing to the errors is ground cover class
heterogeneity.
INTRODUCTION


When assessing Landsat land cover classifications, there are often many problems that result
in inaccurate reporting of classification accuracies. Unfortunately, there are not just one or two
problems but usually several, which may either decrease or increase the calculated accuracy.

Special attention must be given to six primary areas when addressing the classification accuracy of
Landsat maps:

1. The sample from which verification data are selected must have sufficient unbiased observations
to provide specified confidence limits for the classification accuracies.

2. The verification data must approach 100 percent accuracy.

3. The class definitions used for the verification designations and Landsat classifications must
be similar.

4. The verification data and Landsat classifications must represent the same location prior to 
area by area comparisons. 

5. The reported results must provide omission (Type 1) and commission (Type 2) errors, along with
sources of confusion between classes.

6. Evaluating the performance of the Landsat classification requires an evaluation of homogeneous
class pixels, not of mixed class pixels, which most classifiers cannot process. Heterogeneous class
pixels should also be analyzed to assess the actual accuracy of classification maps.

The purpose of this paper is to describe some of the most likely problems associated with evaluating 
Landsat land cover classifications. In addition, procedures are described for use in an accuracy 
evaluation of Landsat land cover classifications. 


LITERATURE REVIEW 
Methods Used in Collecting Verification Data 

Many workers have stressed the necessity of using a proper sampling design for evaluating the
classification of processed multispectral data (Kelly, 1970; Berry and Baker, 1968; Hajic and
Simonett, 1976; and Genderen and Lock, 1976). A proper sample design yields an unbiased selection
of evaluation fields and adequately samples all classes.

Randomly selected coordinates are often used for locating unbiased evaluation fields. Stratification
procedures may be used to subdivide large areas into units (strata) having similar features (e.g.,
soils, geology, topography, and climate) for a more informative and useful evaluation.
Stratification procedures also enable one to increase the sample size for strata that are
heterogeneous in class composition, thus encouraging a better representation of rare classes.
Zonneveld (1974) and Rudd (1971) achieved an adequate sample size for all categories by stratifying
the study area by class cover and randomly sampling within groups of classes until the rare classes
were adequately represented.




Given predetermined confidence limits and expected percent accuracies, Hord and Brooner
(1976) and Genderen and Lock (1978) list tables to estimate the number of samples required. Ginevan
(1979) provides procedures for computing confidence limits when the verification data are less
than 100 percent accurate. A mathematical basis for selecting the number of sample points is fully
described in those papers. Zonneveld (1974) selected the number of sample points based on the
amount of time and money available. Hay (1979) reports that a minimum sample size of 50 is sufficient
for most applications.
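The normal approximation that underlies such tables can be inverted to give a rough sample size: the smallest N for which b * sqrt(p(1 - p)/N) does not exceed the desired half-width of the confidence interval. The sketch below assumes that formula; it is not taken from the cited tables, which may differ in detail. `required_sample_size` is a hypothetical helper, and b = 1.96 corresponds to roughly 95 percent confidence.

```python
import math


def required_sample_size(expected_accuracy: float, half_width: float, b: float = 1.96) -> int:
    """Smallest N with b * sqrt(p * (1 - p) / N) <= half_width (normal approximation)."""
    if not 0.0 < expected_accuracy < 1.0:
        raise ValueError("expected_accuracy must lie strictly between 0 and 1")
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")
    p = expected_accuracy
    return math.ceil(b * b * p * (1.0 - p) / half_width ** 2)


# Example: expect about 85 percent accuracy and want limits of +/- 5 percent
# at roughly 95 percent confidence.
print(required_sample_size(0.85, 0.05))
```

The requirement grows with the inverse square of the half-width, so halving the interval roughly quadruples the number of sample pixels.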

Krebs (1976) evaluated different methods of obtaining verification data. She concluded that
it is more efficient to use photointerpretation of aerial photography, when available, than actual
fieldwork. This approach reduces the time involved in collecting data and allows for the sampling
of inaccessible areas. Genderen and Lock (1976) report that field checks are necessary for areas
that are difficult to photointerpret correctly.

Smedes (1975) reports on many of the problems associated with obtaining 100 percent accurate
verification data. One problem in particular is ground cover heterogeneity, which causes:
1) compounded problems when there is spatial misregistration between verification and Landsat
data; 2) frequent misphotointerpretation of verification fields; and 3) difficulty providing
adequate class definitions patterned after the Landsat classifications.


Methods of Analysis


In almost all quantitative studies, the processed data are compared with the verification data to
obtain the percentage of correct or incorrect occurrences (Rudd, 1971; Biehl and Silva, 1975). The
percentage agreement is supplied for each class and for the total sampled population. Hord and
Brooner (1976) give a formula for obtaining confidence limits for the accuracies.


A class confusion table was used by Genderen and Lock (1976) and Tom (1977) to obtain the
frequency with which one class may be attributed to another, along with two types of error. Type 1
and Type 2 errors are also referred to as omission and commission errors, respectively. A two-way
decision table (Table 1) depicts the four possible outcomes for the results in a class confusion table.
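A minimal sketch of tabulating a class confusion table and the two error types from paired labels. It assumes rows hold verification classes and columns hold Landsat classes, so omission (Type 1) error is read along a row and commission (Type 2) error down a column; the function names and example labels are hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_table(verification: Sequence[str], landsat: Sequence[str],
                    classes: Sequence[str]) -> List[List[int]]:
    """Count pixels: rows = verification class, columns = Landsat class."""
    index = {c: i for i, c in enumerate(classes)}
    table = [[0] * len(classes) for _ in classes]
    for ver, lan in zip(verification, landsat):
        table[index[ver]][index[lan]] += 1
    return table


def class_errors(table: List[List[int]], classes: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Per-class accuracy, omission (Type 1) and commission (Type 2) error rates."""
    results = {}
    for i, c in enumerate(classes):
        row_total = sum(table[i])                 # verification pixels of class c
        col_total = sum(row[i] for row in table)  # Landsat pixels labeled c
        correct = table[i][i]
        results[c] = {
            "accuracy": correct / row_total if row_total else float("nan"),
            "omission": (row_total - correct) / row_total if row_total else float("nan"),
            "commission": (col_total - correct) / col_total if col_total else float("nan"),
        }
    return results


def overall_accuracy(table: List[List[int]]) -> float:
    total = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / total


classes = ["fir", "aspen", "sagebrush"]  # hypothetical labels
verification = ["fir", "fir", "aspen", "sagebrush", "sagebrush"]
landsat = ["fir", "sagebrush", "aspen", "sagebrush", "fir"]
table = confusion_table(verification, landsat, classes)
print(table)                    # [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
print(overall_accuracy(table))  # 0.6
print(class_errors(table, classes)["fir"])
```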


Hord and Brooner (1976) recommend giving classification accuracies for various levels of
classification. For example, a third level of classification separates aspen, cottonwood, ponderosa
pine, and lodgepole pine. At level two, aspen is combined with cottonwood to form deciduous forest,
and ponderosa pine with lodgepole pine to form coniferous forest. The lowest level of classification
(level one) then combines deciduous with coniferous for a general forest class. A classification
accuracy should be established at each level of classification. This approach allows the evaluator to
analyze the Landsat classifications for different groupings of ground cover communities. Anderson
et al. (1976) provide a hierarchical classification system based on remote sensing capabilities.
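The level-by-level evaluation amounts to relabeling each (verification, Landsat) pair with a coarser class and recomputing accuracy. The sketch below uses the forest example from the paragraph above; the mapping dictionaries and the four example pairs are hypothetical illustrations, not data from this study.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (verification class, Landsat class)


def overall_accuracy(pairs: List[Pair]) -> float:
    return sum(ver == lan for ver, lan in pairs) / len(pairs)


def relabel(pairs: List[Pair], mapping: Dict[str, str]) -> List[Pair]:
    """Collapse both labels of every pair to the next-coarser level."""
    return [(mapping[ver], mapping[lan]) for ver, lan in pairs]


# Hypothetical mappings following the forest example in the text.
LEVEL3_TO_2 = {"aspen": "deciduous forest", "cottonwood": "deciduous forest",
               "ponderosa pine": "coniferous forest", "lodgepole pine": "coniferous forest"}
LEVEL2_TO_1 = {"deciduous forest": "forest", "coniferous forest": "forest"}

level3 = [("aspen", "cottonwood"), ("aspen", "aspen"),
          ("ponderosa pine", "lodgepole pine"), ("lodgepole pine", "aspen")]
level2 = relabel(level3, LEVEL3_TO_2)
level1 = relabel(level2, LEVEL2_TO_1)
for name, pairs in [("level 3", level3), ("level 2", level2), ("level 1", level1)]:
    print(name, overall_accuracy(pairs))  # prints 0.25, 0.75, then 1.0
```

Accuracy can only stay the same or rise as classes are merged, which is why accuracies should be reported per level rather than as a single figure.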






LANDSAT DERIVED CLASSIFICATION MAPS


The example of a Landsat-derived classification used for this evaluation was a wildland
classification of a 7,500 square kilometer area near Piceance Creek Basin in northwestern Colorado
that was prepared under contract to the Fish and Wildlife Service (FWS) by Bendix Corporation (see
Bendix Aerospace Systems Division, 1978). The classification scheme used (Table 2) was developed
taking into consideration inputs from FWS wildlife biologists on wildlife habitat requirements.
Training fields were selected from 1:30,000 color infrared photography and selectively ground
checked. Several spectral signatures were developed for each land cover class to take into account
the spectral variability introduced into each class by variations in topography, climate, soils, etc.
A standard maximum likelihood supervised classification was used.
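The text names only a "standard maximum likelihood supervised classification." A common textbook form is sketched below under stated assumptions: each spectral signature is modeled as a multivariate Gaussian with its own mean and covariance, all signatures get equal prior probability, no rejection threshold is applied, and several signatures may map to one cover class. The Bendix implementation may have differed on any of these points; the synthetic data and signature names are hypothetical. Requires NumPy.

```python
from typing import Dict, List, Tuple

import numpy as np

Signature = Tuple[np.ndarray, np.ndarray]  # (mean vector, covariance matrix)


def train_signatures(samples: Dict[str, np.ndarray]) -> Dict[str, Signature]:
    """samples maps a signature name to an (n_pixels, n_bands) array of training pixels."""
    return {name: (x.mean(axis=0), np.cov(x, rowvar=False)) for name, x in samples.items()}


def ml_classify(pixels: np.ndarray, signatures: Dict[str, Signature],
                signature_class: Dict[str, str]) -> List[str]:
    """Assign each pixel the cover class of its highest-likelihood signature (equal priors)."""
    names = list(signatures)
    scores = []
    for name in names:
        mean, cov = signatures[name]
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        d = pixels - mean
        mahalanobis = np.einsum("ij,jk,ik->i", d, inv, d)
        # Gaussian log-likelihood, dropping the constant term shared by all signatures.
        scores.append(-0.5 * (logdet + mahalanobis))
    best = np.argmax(np.stack(scores, axis=1), axis=1)
    return [signature_class[names[i]] for i in best]


rng = np.random.default_rng(0)
samples = {  # hypothetical 4-band training pixels; two signatures for one class
    "fir_north": rng.normal(10.0, 1.0, size=(50, 4)),
    "fir_south": rng.normal(14.0, 1.0, size=(50, 4)),
    "sagebrush": rng.normal(30.0, 1.5, size=(50, 4)),
}
signature_class = {"fir_north": "fir", "fir_south": "fir", "sagebrush": "sagebrush"}
signatures = train_signatures(samples)
labels = ml_classify(np.array([[10.0] * 4, [14.0] * 4, [30.0] * 4]), signatures, signature_class)
print(labels)
```

Mapping several signatures to one class is one way to reflect the paragraph's point that each cover class needed multiple signatures to capture topographic, climatic, and soil variation.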


EVALUATION PROCEDURES


Statistical Sampling Scheme 


The schemes used to obtain random and unbiased verification samples were designed for large
areas of 2,500 square kilometers or more. The procedures provide for selection of a stratified
sample to cover large areas that differ in spatial and spectral characteristics. This stratification
encourages a proper sample size from areas differing in size and complexity.


Once the total area was stratified based on geology, climate, topography, and ground cover
information, 7½-minute quadrangles were selected systematically within each stratum. Quadrangle-size
areas were used because quadrangle maps (U.S.G.S. 7½-minute topographic quadrangles) were readily
available and were of a size convenient for the first level of sampling. The number of 7½-minute
quadrangles selected within each stratum depended on the class heterogeneity of the stratum. We
selected more 7½-minute quadrangles from heterogeneous strata in order to increase the sample size
for rare classes.
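One way to express heterogeneity-weighted allocation followed by systematic selection is sketched below. The text does not give the allocation rule, so proportional allocation with largest-remainder rounding, and the heterogeneity weights themselves, are assumptions; stratum and quadrangle names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def allocate_quadrangles(total: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split `total` quadrangles among strata in proportion to `weights` (largest remainder)."""
    weight_sum = sum(weights.values())
    raw = {s: total * w / weight_sum for s, w in weights.items()}
    alloc = {s: math.floor(r) for s, r in raw.items()}
    leftover = total - sum(alloc.values())
    # Give the remaining quadrangles to the strata with the largest fractional parts.
    for s in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


def systematic_sample(items: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Take every (len/n)-th item after a random start."""
    if not 0 <= n <= len(items):
        raise ValueError("n must be between 0 and len(items)")
    if n == 0:
        return []
    step = len(items) / n
    start = rng.random() * step
    return [items[min(int(start + i * step), len(items) - 1)] for i in range(n)]


# Hypothetical heterogeneity weights: stratum B is twice as heterogeneous as A or C.
print(allocate_quadrangles(13, {"A": 1.0, "B": 2.0, "C": 1.0}))
quads = [f"quad_{k}" for k in range(40)]  # hypothetical quadrangles in one stratum
chosen = systematic_sample(quads, 7, random.Random(3))
```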


From each 7½-minute quadrangle we evaluated a set of randomly located pixels. Because spatial
misregistration problems would likely arise when working with isolated pixels, we grouped sets of
9 pixels (3 × 3) into 10-acre cells for use in all evaluation procedures. Approximately 50 10-acre
cells were selected for each 7½-minute quadrangle. We originally selected 25 quadrangles to ensure a
good representation of cover for evaluation, a number thought to be within the budget constraints of
the project. However, the original number proved optimistic, and we had to reduce the number of
7½-minute quadrangles for evaluation from 25 to 13. The final distribution of the 7½-minute
quadrangles is plotted in Figure 1.
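The sketch below picks non-overlapping 3 × 3 pixel cells at random within a quadrangle's pixel grid and checks the 10-acre figure. The pixel dimensions are an assumption (the nominal 57 m × 79 m Landsat MSS pixel, which the text does not state); with them, nine pixels cover about 10.01 acres. The grid size and helper names are hypothetical.

```python
import random
from typing import List, Tuple

ACRE_M2 = 4046.8564224             # square meters per international acre
PIXEL_W_M, PIXEL_H_M = 57.0, 79.0  # assumed nominal Landsat MSS pixel size (not stated in the text)


def cell_area_acres(n_pixels: int = 9, width_m: float = PIXEL_W_M,
                    height_m: float = PIXEL_H_M) -> float:
    return n_pixels * width_m * height_m / ACRE_M2


def sample_cells(n_rows: int, n_cols: int, n_cells: int,
                 rng: random.Random) -> List[Tuple[int, int]]:
    """Pick n_cells non-overlapping 3 x 3 cells; return each cell's upper-left (row, col)."""
    origins = [(r, c) for r in range(0, n_rows - 2, 3) for c in range(0, n_cols - 2, 3)]
    return rng.sample(origins, n_cells)


print(round(cell_area_acres(), 2))  # about 10 acres
cells = sample_cells(n_rows=240, n_cols=180, n_cells=50, rng=random.Random(42))
```

Aligning candidate cells on a 3-pixel lattice is a simple way to guarantee that no two sampled cells share a pixel.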


Verification Data Collection 


Photointerpretation and field visits were used to generate verification data. The 50 10-acre
cells per quadrangle (see Figure 2 for an example) were photointerpreted by personnel from Ecology
Consultants, Inc., who had experience with western land cover. A Zoom Transfer Scope was used to
plot the randomly selected 10-acre cells (as overlaid on 7½-minute quadrangles) onto color infrared
photographs.







Genderen and Lock (1978) note that a form supplied to the personnel obtaining verification
data will improve the efficiency and consistency of the work. All of the information completed by
Ecology Consultants, Inc. was placed on verification forms (see Figure 3). The information includes
ground cover class identity for each pixel, ground cover class boundaries, percent pixel coverage,
and overall cell relief, all of which are defined on the form.


The photointerpreters' class designations of the verification pixels were to be patterned after the
training fields (as they appeared on color infrared photographs) used in the classification of
Landsat data. The purpose was to maximize the correspondence between the spectral and spatial
characteristics used in the verification data class designations and those of the training fields.
This is necessary to evaluate the accuracy of the classification of Landsat data. Unfortunately,
this was not completely achieved, as described in the section on Lack of Correspondence Between
Signature and Verification Data Sets.


Evaluation Algorithms 

We completed a pixel-by-pixel comparison between the verification data and Landsat
classifications for four levels of classification. The results of the comparisons are displayed in
class confusion tables. The classification accuracies are an important part of these tables.

Confidence limits were assigned to the accuracies by evaluating the following approximation for $\mu$:

$$\Pr\left(-b < \frac{\bar{X} - \mu}{\sigma/\sqrt{N}} < b\right) \approx 1 - \alpha \qquad (1)$$

where

$100(1-\alpha)$ is the confidence level of the limit,

$\mu$ is the probability that any pixel of a given class is correctly classified,

$\bar{X}$ is the estimate of $\mu$, i.e., the class accuracy,

$\sigma^2$ is the variance of the binomial distribution of the $X_i$ (where $X_i = 1$ if sampled pixel $i$ is correctly classified and 0 otherwise),

$N$ is the number of sampled pixels,

$b$ is obtained from the normal distribution tables,

$\alpha$ is the probability that $\mu$ lies beyond the range of the confidence limits.


A more detailed description of assigning confidence limits is given by Hord and Brooner
(1976). The logic for the proof of the approximation is given by Brunk (1965).
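A sketch of how equation (1) turns a count of correctly classified sample pixels into limits, using the plug-in estimate sigma = sqrt(X̄(1 - X̄)). The clamping to [0, 1] and the function name are additions for illustration; the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def confidence_limits(correct: int, n: int, b: float = 1.96) -> Tuple[float, float, float]:
    """Return (estimate, lower, upper) for a class accuracy via the normal approximation."""
    x_bar = correct / n                       # estimate of mu
    sigma = math.sqrt(x_bar * (1.0 - x_bar))  # plug-in binomial standard deviation
    half_width = b * sigma / math.sqrt(n)
    return x_bar, max(0.0, x_bar - half_width), min(1.0, x_bar + half_width)


est, lo, hi = confidence_limits(correct=50, n=100)
print(f"{est:.3f} ({lo:.3f} to {hi:.3f})")  # 0.500 (0.402 to 0.598)
```

The approximation is least trustworthy for small N or accuracies near 0 or 1, which is one reason per-class limits for rare classes tend to be wide.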

Information conveyed from Landsat, in bits per square kilometer, was computed for all levels of
classification. This is accomplished by computing joint probabilities from the class confusion table
obtained during evaluation. That is,

$$P(x,y) = P(x \mid y)\,P(y) = P(y \mid x)\,P(x) \qquad (2)$$
where x is defined as the verification data and y as the output from Landsat. These values are used
to compute the joint uncertainty:

$$H(x,y) = H(x \mid y) + H(y) = H(y \mid x) + H(x) \qquad (3)$$

From these values we obtain the contingent uncertainty, which was equated to the information
transmitted:

$$H(x{:}y) = H_{\max}(x,y) - H(x,y) \qquad (4)$$

where $H_{\max}(x,y) = H(x) + H(y)$ is the maximum joint uncertainty, which exists when there is no
correlation between x and y. The information transmitted is valid only if the verification data are
100 percent accurate; if not, the computed values should be used only to compare performance between
classification levels and/or areas. For a more detailed explanation of procedures, see Maxwell (1975)
and Garner (1962).
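The sketch below computes equations (3) and (4) directly from confusion-table counts, treating rows as verification classes (x) and columns as Landsat classes (y). The conversion to bits per square kilometer is an assumption: it simply multiplies bits per pixel by an assumed number of pixels per square kilometer (57 m × 79 m pixels), since the text does not spell out its normalization.

```python
import math
from typing import Iterable, List


def entropy_bits(probs: Iterable[float]) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_transmitted(table: List[List[int]]) -> float:
    """Contingent uncertainty H(x:y) = H(x) + H(y) - H(x,y), in bits per pixel.

    Rows are verification classes (x); columns are Landsat classes (y).
    """
    total = sum(sum(row) for row in table)
    h_x = entropy_bits(sum(row) / total for row in table)
    h_y = entropy_bits(sum(col) / total for col in zip(*table))
    h_xy = entropy_bits(count / total for row in table for count in row)
    return h_x + h_y - h_xy


def bits_per_km2(table: List[List[int]], pixel_area_m2: float = 57.0 * 79.0) -> float:
    """Scale bits per pixel by an assumed number of pixels per square kilometer."""
    return information_transmitted(table) * 1.0e6 / pixel_area_m2


print(information_transmitted([[5, 0], [0, 5]]))  # 1.0: perfect agreement, two equally likely classes
print(information_transmitted([[5, 5], [5, 5]]))  # 0.0: no association between x and y
```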

There are several problems associated with classification decisions when more than one ground
cover class occurs within a pixel. Present classification algorithms, including the one used in this
study, are not designed to classify mixed class pixels. Therefore, to fully test the capabilities of
the Landsat classification, it is important to evaluate the homogeneous cells separately. In this
study we separated the homogeneous areas within the sampled area and performed an additional
evaluation only in those areas.


Another procedure implemented to reduce mixed class pixel problems was to aggregate nine-pixel cells
for the verification data and Landsat classifications and select a single class designation for each
cell based upon the majority of the classes present. Results from these comparisons also reduce
sources of error attributed to minor differences in spatial registration between verification and
Landsat classifications.
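A sketch of the cell-level comparison: each 3 × 3 cell is reduced to its most frequent class before verification and Landsat are compared. The text says only that a "majority" designation was used, so the plurality rule and the tie-break (first label encountered, row by row) are assumptions; the example cells are hypothetical.

```python
from collections import Counter
from typing import Sequence

Cell = Sequence[Sequence[str]]


def cell_class(cell: Cell) -> str:
    """Most frequent label in a 3 x 3 cell; ties go to the label encountered first (row by row)."""
    counts = Counter(label for row in cell for label in row)
    return counts.most_common(1)[0][0]


def cells_agree(verification_cell: Cell, landsat_cell: Cell) -> bool:
    return cell_class(verification_cell) == cell_class(landsat_cell)


ver = [["fir", "fir", "sagebrush"],
       ["fir", "fir", "sagebrush"],
       ["fir", "aspen", "sagebrush"]]
lan = [["sagebrush", "fir", "fir"],
       ["fir", "fir", "fir"],
       ["aspen", "fir", "sagebrush"]]
print(cell_class(ver), cell_class(lan), cells_agree(ver, lan))
```

In this example the two cells agree at the cell level even though only 4 of the 9 pixels agree pixel by pixel, which is the kind of tolerance to small registration errors the aggregation provides.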


EVALUATION RESULTS 


Registration Analysis 


A necessary prerequisite for a pixel-by-pixel comparison between verification data and Landsat
classifications is that they represent the same location (i.e., the Landsat pixels and the
verification data are precisely registered). As a check for possible misregistration of the Landsat
classifications, we shifted the 3 × 3 pixel verification cells, simultaneously within each quad, by
one pixel in all directions. The 3 × 3 verification cell has nine possible positions for comparison
within an extracted 5 × 5 Landsat classified area.


For each 7½-minute quadrangle, we computed the overall classification accuracy between the
verification and Landsat classifications at each of the nine possible positions. The overall
classification accuracy for each position was weighted by the class acreages from the classification
results, and the position with the highest value was assumed to be the best registered. This position
was selected for all subsequent analyses. The weightings were added to ensure that classes covering
the majority of the acreage have a stronger effect on the selection of the best-registered position.
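The cell-shift check can be sketched as scoring all nine placements of each 3 × 3 verification cell inside its 5 × 5 Landsat window and keeping the best-scoring offset for the quadrangle. How the acreage weighting was applied is not specified, so this sketch assumes a weighted mean of per-class accuracies with user-supplied weights; all names and the example data are hypothetical. Offsets are (rows down, columns right), so (1, -1) means one pixel down and one pixel to the left.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Grid = Sequence[Sequence[str]]


def shifted_pairs(ver3: Grid, land5: Grid, dr: int, dc: int) -> Iterable[Tuple[str, str]]:
    """Pair each verification pixel with the Landsat pixel under the cell placed at (dr, dc)."""
    for r in range(3):
        for c in range(3):
            yield ver3[r][c], land5[r + dr][c + dc]


def weighted_accuracy(pairs: Iterable[Tuple[str, str]], weights: Dict[str, float]) -> float:
    """Per-class accuracies averaged with class weights (e.g., mapped acreages)."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for ver, lan in pairs:
        total[ver] = total.get(ver, 0) + 1
        correct[ver] = correct.get(ver, 0) + (ver == lan)
    den = sum(weights.get(c, 0.0) for c in total)
    if den == 0:
        return 0.0
    return sum(weights.get(c, 0.0) * correct[c] / total[c] for c in total) / den


def best_offset(cells: List[Tuple[Grid, Grid]], weights: Dict[str, float]):
    """Score all nine positions of the 3 x 3 cells in their 5 x 5 windows; (0, 0) is unshifted."""
    scores = {}
    for dr in range(3):
        for dc in range(3):
            pairs = [p for ver3, land5 in cells for p in shifted_pairs(ver3, land5, dr, dc)]
            scores[(dr - 1, dc - 1)] = weighted_accuracy(pairs, weights)
    return max(scores, key=scores.get), scores


# Hypothetical window in which the fir patch sits one pixel down and one pixel left.
ver3 = [["fir"] * 3 for _ in range(3)]
land5 = [["sagebrush"] * 5 for _ in range(5)]
for r in range(2, 5):
    for c in range(0, 3):
        land5[r][c] = "fir"
offset, scores = best_offset([(ver3, land5)], {"fir": 1.0})
print(offset)  # (1, -1)
```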






For another study area (see Bendix Aerospace Systems Division, 1978) there was a consistent
tendency for best registration when shifting the verification cells one pixel down and one pixel to
the left. Since this position consistently gave the best results, it was assumed the Landsat
classifications were properly registered in this position and no further analysis was undertaken.

For this evaluation, the cell-shift analysis gave no consistent best position. For this reason, the
following additional effort was undertaken.


To check the spatial registration of the Landsat classifications, features on the 7½-minute
quadrangle maps were manually positioned to register with the classes representing the same features.
Useful features included reservoirs, tree lines, and cliffs, with reservoirs being the most reliable.


We determined the Landsat classification maps to be spatially misregistered by zero to four
pixels. The provision for shifting the verification data by one pixel in all directions could not
correct for these errors. We therefore had to correct the misregistration manually.


We changed the x,y coordinates of the Landsat pixel areas to correct for the errors observed. This
was done for each quadrangle individually, since the misregistrations differed between quadrangles.
However, the misregistration was consistent within a quadrangle.


To test the correction procedure, we calculated the overall classification accuracy before and after
registration correction. In all instances except the Jessup Gulch quadrangle, the overall percentages
increased for the positionally corrected maps (see Table 7), and we proceeded with the analyses (all
results given in the previous tables are from positionally corrected maps).


Accurately locating verification cell boundaries on color infrared photographs referenced to
7½-minute USGS topographic quadrangles was necessary for the spatial registration of the verification
data. The accuracy of this registration was analyzed by comparing two people's work for identical
cells.


The results for 12 cells showed errors ranging from 0.01 cm to 0.10 cm, with an average error of
0.05 cm. The pixel dimensions at 1:30,000 scale are approximately 0.18 cm in width and 0.25 cm in
length. Thus the average error was about one-quarter of a pixel, which was deemed acceptable. This is
by no means conclusive, since a sample of 12 cells out of approximately 450 is not adequate, but
later spot checks indicated that reasonable accuracy was being maintained.
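The quarter-pixel figure follows from dividing the average error by each photo-scale pixel dimension; the 1:30,000 scale factor converts the same figures to ground distances. Only numbers stated above are used.

$$\frac{0.05\ \text{cm}}{0.18\ \text{cm}} \approx 0.28, \qquad \frac{0.05\ \text{cm}}{0.25\ \text{cm}} = 0.20$$

$$0.05\ \text{cm} \times 30{,}000 = 15\ \text{m}, \qquad 0.18\ \text{cm} \times 30{,}000 = 54\ \text{m}, \qquad 0.25\ \text{cm} \times 30{,}000 = 75\ \text{m}$$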


Typical Results 


Evaluation of the Landsat classifications at the four levels of classification yielded low overall
results. Results for the total sampled area at the 12-class level, for example, are summarized in
Table 3. Inspection of Table 3 yields information on confusion between classes, omission and
commission errors, overall classification accuracy, confidence limits, and information quantity
assessment. The data are a compilation of results from all quads. Other analyses were completed for
individual quads and for quads grouped into strata (see Toll, 1978).
The data in Table 4 result from assigning each 10-acre cell a single class designation prior to
comparisons between verification data and Landsat classifications, to reduce adverse effects from
mixed class pixels and possible spatial misregistration. A comparison between Tables 3 and 4 at the
13-class level shows an overall increase in accuracy from 37.6 percent to 44.3 percent.


As stated earlier, because we were interested in evaluating the performance of the Landsat
classification, we felt a further evaluation should be made only for homogeneous cells. We therefore
discarded all cells that were heterogeneous in land cover and evaluated only the homogeneous cells.
Results (see Tables 5 and 6) at the 13-class level show an improved overall classification accuracy
of 61.3 percent for the pixel-by-pixel comparisons and 73.9 percent when designating the 10-acre
cells as single classes prior to analysis. Clearly, the increases may be attributed to removing from
the evaluation mixed class pixels, which the classifier was not designed to categorize, and to
converting 10-acre cells (i.e., 9 class designations) into single class representations, thereby
reducing the need for accurate spatial registration.

Even though the additional analyses showed higher classification accuracies, we thought an
evaluation was necessary to further examine errors not attributable to the Landsat data and/or the
classifier used for the Landsat classifications. Results from these analyses are provided in
subsequent sections.


Lack of Correspondence Between Signature and Verification Data Sets 


Because of known deficiencies in the training data and in photointerpreting the verification
cells, the possibility of poor correspondence between the training fields and the verification fields
was considered. In other words, it became evident that the ground cover conditions for the training
fields classified as low density sagebrush might not always be the same ground cover conditions that
the photointerpreter identified as low density sagebrush. If this supposition were true, then there
would be no basis for seeking agreement between the two sets of data (Landsat and verification).
The following effort was undertaken to analyze the correspondence between the ground cover
characteristics of the verification pixels seen on the color infrared photographs and the ground
cover characteristics of the training fields as seen on the same photographs.


A sample of all the verification pixels was examined in detail for its correspondence to the
training field descriptions. The sample was obtained from the quadrangles for which we had color
infrared photography coverage (eight quadrangles overall). From each quadrangle we selected every
other nine-pixel cell, and from those cells we analyzed three randomly selected pixels.

We examined the appearance of the sampled verification pixels in the color infrared photography for
their ground cover type(s), color, texture, and vigor, as was done for the training fields. Once the
spectral and spatial characteristics of the verification pixels were obtained, they were compared
with the spectral and spatial characteristics of the training fields to determine how often they
had good correspondence. The criterion for good correspondence was that if more than 75 percent of
the pixel's spectral and spatial ground cover characteristics were also included in the training
field descriptions, then the pixel was said to be in good correspondence with the training fields.
Of the ground cover characteristics, type and color were examined most closely. Although the texture
and vigor information were useful in some borderline cases and in instances where there may have
been differences in photographic reproduction or processing between photographs, we considered the
ground cover type and color information more important. If the pixel contained more than 25 percent
of spectral and spatial characteristics not in the training data, then the pixel was said to be
heterogeneous.

Once the criterion for correspondence was established, we could examine the significance of
correspondence (i.e., class definitions) to agreement (i.e., classification accuracy) between Landsat
and verification data. If the verification pixels that were in correspondence with the training
fields showed the same bias toward disagreement with the Landsat classification as the verification
pixels that were not in correspondence with the training fields, then we would conclude that poor
correspondence of the verification pixels did not affect or contribute to the confusion between the
verification and Landsat data. To evaluate this statement, we set up the null hypothesis that the
verification pixels that were not in correspondence with the training fields did not show bias toward
disagreement with the Landsat classifications. If the null hypothesis can be rejected at a low
confidence level, then we may conclude that most of the confusion between verification and Landsat
data may be attributed to the ground cover characteristics of the verification pixels not
corresponding to the ground cover characteristics of the training fields.


The results from the above analysis may be placed in a two-way contingency table (see Table 8).
Along the rows are the verification pixels that either were or were not in agreement with the
Landsat classifications, and along the columns are the verification pixels that were or were not in
good correspondence with the training fields.

For each of the classes, at two classification levels, we used the format in Table 8 to evaluate
the null hypothesis. Fisher's exact probability test (from Till, 1974) was used to test the null
hypothesis by solving for P:

$$P = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{N!\,a!\,b!\,c!\,d!}, \qquad N = a+b+c+d$$

where P is the probability of the partitioning of the four frequencies (a, b, c, and d in the two-way
contingency table) arising by chance. The value of P is the level at which the null hypothesis is
rejected.
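The formula above gives the probability of a single table with fixed margins; in the usual form of Fisher's exact test, the P value sums that probability over the observed table and every more extreme table with the same margins. The text does not say which tail was used, so the one-sided direction below (tables with a larger a) is an assumption, and the table should be arranged so that a counts pixels in the hypothesized direction. Function names and the example counts are hypothetical. Requires Python 3.8+ for `math.comb`.

```python
from math import comb


def table_probability(a: int, b: int, c: int, d: int) -> float:
    """Probability of one 2 x 2 table with fixed margins (hypergeometric form of the formula)."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """Sum of table probabilities for the observed table and all tables with a larger `a`."""
    row1, row2, col1 = a + b, c + d, a + c
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        # Keep every margin fixed while moving counts into cell a.
        p += table_probability(x, row1 - x, col1 - x, row2 - col1 + x)
    return p


print(table_probability(3, 1, 1, 3))  # 16/70, about 0.229
print(fisher_one_sided(3, 1, 1, 3))   # 17/70, about 0.243
```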

Table 9 gives, for two classification levels, the confidence levels at which we may reject the null
hypothesis. Overall, we conclude that poor correspondence between verification pixels and training
fields was a factor in the low Landsat classification accuracies. Three deficiencies contributed to
the poor correspondence: 1) the training areas were not diverse enough to take into account most of
the ground cover variation; 2) there were errors in photointerpretation where a verification cell was
designated the wrong class; and 3) for many classes, the class definitions used when designating the
verification fields differed from those defined by interpreting the training fields. The underlying
factor contributing to these errors was the heterogeneous ground cover in the study area.




For the fir class, as an example, Table 10 shows that 5 of the 21 pixels analyzed were in poor
correspondence with the training field descriptions at the 24-class level. The five pixels with poor
correspondence usually contained a shrub type in addition to fir.

An example of poor correspondence is given in Figure 4. Pixels 3 and 5 contain over 50 percent of a
ground cover type other than fir. Within these pixels, designated fir for the verification data,
there are both mixed shrub and sagebrush/grassland ground cover types, which would explain why pixel 3
was designated as the mixed shrub-sagebrush class and pixel 5 as the mixed shrub class in the Landsat
classifications. Pixel 1 contains close to 100 percent fir and was designated as fir in the Landsat
classifications. The situation seen in Figure 4 clearly explains the cause of confusion between the
fir and shrub-sagebrush classes and is representative of the situation observed for these classes.

We tested the null hypothesis that the fir designations for the verification pixels that were not in
correspondence with the training fields did not show any bias toward disagreement with the Landsat
data. Using Fisher's test on the data in Table 10, the null hypothesis may be rejected at the 0.01
confidence level for the 24-class level and at the 0.02 confidence level for the 9-class level.
Hence, much of the confusion may be attributed to the ground cover characteristics of the verification
pixels not corresponding to the ground cover characteristics of the training fields.

Another clear example in which poor correspondence between the verification data and training fields
caused misclassification is the mixed shrub-sagebrush class. The data in Table 11 show that 45 of the
63 pixels analyzed were in poor correspondence with the training field data at the 24-class level.

Again, Fisher's test was used to evaluate the null hypothesis for the data in Table 11. At both the
24-class and 12-class levels, the null hypothesis may be rejected at the 0.01 confidence level. Much
of the poor correspondence was the result of mixed class pixel problems and differences in shrub types
between the verification pixels and training fields.

Figure 5 illustrates reasons for the poor correspondence between the verification pixels and the
Landsat classifications. Verification pixels 1 and 2 both contain approximately 50 percent or more of
a barren-grassland combination not occurring in the training fields, along with approximately 20
percent pinyon-juniper. Furthermore, the shrub type that does occur in the verification pixels does
not resemble the shrub types in the mixed shrub-sagebrush training fields, as represented in Figure 6.


Photointerpretation Errors 


To obtain reliable estimates of classification accuracy, the verification data must approach 100
percent accuracy. For this study, photointerpretation along with spot field checks was used to
classify the verification data. Since this was crucial to the evaluation of classification accuracy,
we reevaluated the photointerpretation work.
The previous section on Lack of Correspondence Between Signature and Verification Data Sets shows
the photointerpreters designating pixels that were not in correspondence with the class descriptions.
This occurred for 63 percent of the sampled verification pixels. The photointerpreters should have
designated those pixels as uncategorized. Additionally, there were many instances when the
verification pixels should have been designated as another class.

The high rate of misdesignation of the verification pixels (i.e., 63 percent) affected the reported
classification accuracies. An examination of the misdesignated verification pixels showed that 90
percent of them were in disagreement with the Landsat classifications. This clearly demonstrates that
much of the disagreement between the Landsat classifications and the verification pixels was a result
of photointerpreter error and not of misclassified Landsat data. However, the inability of the
photointerpreters to satisfactorily designate verification pixel classes patterned after the training
fields stems from the fact that the training fields used in the Landsat classifications did not
adequately represent the study area ground cover.

Figure 7 provides a typical example of the need for the photointerpreters to have designated
verification pixels as uncategorized or as another class. Verification pixel 2 is an example of a
barren area designated as low density sagebrush. The training fields for low density sagebrush have
no barren area components. Furthermore, the lowest density sagebrush training field contains around
70 percent sagebrush. Pixel 2 has only 10 percent sagebrush, with the remaining land cover closely
patterning the tone and texture in the barren training fields. Verification pixel 5 has a combination
of barren, grassland, and sagebrush land cover. This pixel does not correspond with any of the
training fields and should have been designated as uncategorized.


Changing Class Definitions 

After obtaining overall poor results, we evaluated the possibility of having different class
definitions for the signature and verification data sets. One inadequacy in particular was with the
high density pinyon-juniper class. There were almost three times as many pixels designated high
density pinyon-juniper in the Landsat classifications as there were in the verification data. This
was a result of one of the training fields used in the Landsat classifications (see Figure 8)
containing equal proportions of pinyon-juniper, grassland, sagebrush, and other types of shrub,
causing areas of these mixtures to be designated as high density pinyon-juniper. The
photointerpreters assumed there were no mixtures of these cover types and did not use such
information in their interpretations, thereby contributing to the lower proportions of
pinyon-juniper designated in the verification data and to the confusion with the shrub and grass
classes (see Tables 3 through 6).

Several other deficiencies were noticed. First, the training fields for the low density and high
density sagebrush classes both contained 70 percent sagebrush. However, photointerpreters usually
designated low density sagebrush stands as low density sagebrush even when they occurred in mixtures
with other ground cover, whereas in the Landsat classifications these areas were mixed-class pixels,
frequently resulting in the designation of a non-sagebrush class. Second, a training field for the
mixed shrub class more closely matched the mixed shrub-sagebrush class, which likely contributed to
the confusion between these classes. Third, the mixed shrub types found in the training fields
represented only a minority of the shrub types occurring throughout the study area, whereas in the
photointerpretation almost all shrub types were designated as mixed shrub even though they had no
representation in the training fields. Finally, a training field for the dry agriculture class
contained some irrigated agriculture areas. Most of these problems were eliminated when evaluating
lower classification levels. Classes which had homogeneous cover and a representative selection of
training fields, such as aspen, showed higher classification accuracies.
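Evaluating at a lower classification level amounts to relabeling both data sets through a class grouping before comparing them. The Python sketch below illustrates that idea only and is not the procedure used in this study. The dictionary `TO_8_CLASS` is a hypothetical fragment based on one reading of the Table 2 grouping, and `percent_agreement` and the sample labels are illustrative.

```python
def percent_agreement(landsat, verification, grouping=None):
    """Percent of paired pixels with matching labels.

    If `grouping` is given, both label lists are first collapsed to the
    coarser classes it maps to (e.g., 24-class names -> 8-class names).
    """
    if len(landsat) != len(verification) or not landsat:
        raise ValueError("need two non-empty label lists of equal length")
    if grouping is not None:
        landsat = [grouping[label] for label in landsat]
        verification = [grouping[label] for label in verification]
    matches = sum(a == b for a, b in zip(landsat, verification))
    return 100.0 * matches / len(landsat)


# Hypothetical fragment of the Table 2 grouping (24-class -> 8-class level).
TO_8_CLASS = {
    "Mixed Shrub": "Shrub and Pinyon-Juniper",
    "Mixed Shrub-Sagebrush": "Shrub and Pinyon-Juniper",
    "Pinyon-Juniper (High Density)": "Shrub and Pinyon-Juniper",
    "Agriculture (Dry)": "Grass and Agriculture",
    "Agriculture (Wet)": "Grass and Agriculture",
    "Aspen (High Vigor)": "Forest",
}

# Hypothetical paired labels for four sampled pixels.
landsat = ["Mixed Shrub", "Pinyon-Juniper (High Density)",
           "Agriculture (Dry)", "Aspen (High Vigor)"]
verification = ["Mixed Shrub-Sagebrush", "Mixed Shrub",
                "Agriculture (Wet)", "Aspen (High Vigor)"]

print(percent_agreement(landsat, verification))              # 25.0
print(percent_agreement(landsat, verification, TO_8_CLASS))  # 100.0
```

Disagreements between easily confused classes, such as mixed shrub and mixed shrub-sagebrush, vanish once both fall in the same coarser class. The higher agreement is therefore partly a consequence of the grouping, and it says correspondingly less about the detailed classes.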


SUMMARY AND CONCLUSIONS 


Evaluating classification accuracy requires much more than a simple sample design and analysis
of results. Procedures must be developed which adequately sample unbiased verification fields,
yielding narrow confidence limits. Furthermore, procedures must be implemented that analyze Landsat
classifications rigorously, such as analysis within both heterogeneous and homogeneous areas and
methods that reduce the effects of spatial misregistration. For a complete analysis, results should
be output in contingency tables with supplemental information on classification accuracies along
with confidence limits, overall classification accuracy, omission and commission errors, and an
information quantity assessment.
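The supplemental statistics listed above can all be derived from a single confusion table. The following Python sketch is illustrative only. The row/column orientation, the normal-approximation confidence limits, and the use of mutual information (bits per pixel) as the information measure are assumptions, not necessarily the formulas used for Tables 3 through 6. The limits also treat the sample as a simple random sample, so they ignore the strata weights of a stratified design. Multiplying bits per pixel by an assumed number of pixels per square kilometer would give a figure in bits per square kilometer.

```python
import math


def accuracy_report(table, z=1.96):
    """Summarize a square confusion table.

    table[i][j] = number of sampled pixels designated class i in the
    verification data and class j in the Landsat classification.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    # Overall accuracy with normal-approximation binomial confidence limits.
    p = sum(table[i][i] for i in range(k)) / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    limits = (max(0.0, p - half), min(1.0, p + half))

    # Omission: verification pixels of class i that Landsat labeled otherwise.
    omission = [1.0 - table[i][i] / row_tot[i] if row_tot[i] else float("nan")
                for i in range(k)]
    # Commission: Landsat pixels of class j that verification labeled otherwise.
    commission = [1.0 - table[j][j] / col_tot[j] if col_tot[j] else float("nan")
                  for j in range(k)]

    # Mutual information between the two labelings, in bits per pixel.
    bits = 0.0
    for i in range(k):
        for j in range(k):
            if table[i][j]:
                bits += (table[i][j] / n) * math.log2(
                    table[i][j] * n / (row_tot[i] * col_tot[j]))

    return {"overall": p, "limits": limits, "omission": omission,
            "commission": commission, "bits_per_pixel": bits}


# Hypothetical two-class table, not data from this study.
report = accuracy_report([[40, 10],
                          [20, 30]])
print(round(report["overall"], 3))         # 0.7
print(round(report["bits_per_pixel"], 3))  # 0.125
```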


Overall low classification accuracies caused us to evaluate the class characteristics used in the
Landsat classifications and verification fields. One particular problem was that the class descriptions
were often inadequate for the diverse study area, and the ground cover represented by a class often
had different class descriptions in the Landsat classifications and in the verification data. These
errors resulted in lower overall classification accuracies and an identifiable need for evaluation at
lower classification levels. In many cases the results were merely a measure of the agreement
between two data sets and not a valid measure of classification accuracy.


The underlying factor for many of the problems of poor correspondence between the training
fields and the verification fields is attributed to the spatial complexity of the ground cover. There
are many combinations of ground cover, resulting in an infinite possible set of proportions and
patterns. Additionally, most comparisons require an accurate spatial registration of the Landsat
classifications and verification fields. In heterogeneous areas a slight shift in the location of
verification cells will change the ground cover mixture, frequently changing the class designation.
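The sensitivity of heterogeneous areas to registration error can be shown with a toy grid. In this Python sketch, which uses hypothetical data and a hypothetical helper `shifted_agreement`, a classification that is correct pixel for pixel loses all of its agreement when offset by one pixel across narrow cover bands. A homogeneous area is unaffected by the same offset.

```python
def shifted_agreement(classified, reference, dr=0, dc=0):
    """Percent agreement when `classified` is offset by (dr, dc) pixels
    relative to `reference`; only overlapping pixels are compared."""
    rows, cols = len(reference), len(reference[0])
    matches = compared = 0
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                compared += 1
                matches += classified[rr][cc] == reference[r][c]
    return 100.0 * matches / compared


# Heterogeneous ground: sagebrush (S), grass (G), barren (B) in narrow bands.
mixed = [["S", "G", "B", "S", "G", "B"] for _ in range(3)]
# Homogeneous ground: sagebrush everywhere.
uniform = [["S"] * 6 for _ in range(3)]

for name, grid in (("heterogeneous", mixed), ("homogeneous", uniform)):
    # The classification is assumed perfect; only the registration changes.
    print(name, shifted_agreement(grid, grid), shifted_agreement(grid, grid, dc=1))
```

Real misregistration is rarely a whole-pixel shift and real cover is rarely striped this regularly, so the example exaggerates the effect. The direction of the effect is the point: the finer the mixture, the more a small offset changes what is compared.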


As has been demonstrated, a simple evaluation of results does not usually provide a reflection
of the true classification accuracy. More likely, no single procedure will work; what is needed is a
rigorous experimental design with procedures objectively pursued.
Table 2. The table below depicts the grouping of classes for different levels of classification.

Level of Classification

24 Classes: Fir; Cottonwood; Aspen (Low Vigor); Aspen (High Vigor); Mixed Shrub; Mixed Shrub-Sagebrush;
Upland Sagebrush (Low Density); Upland Sagebrush (High Density); Bottomland Sagebrush (High Density);
Pinyon-Juniper (Low Density); Pinyon-Juniper (High Density); Grass (Dry); Grass (Dry Meadow);
Grass (Dry Tundra); Grass (Wet Tundra); Agriculture (Dry); Agriculture (Wet); Barren Basalt; Barren Rock;
Water Clear; Water Turbid; Uncategorized: Pinyon-Juniper, Sagebrush, Agriculture, Unknown.

12 Classes: Coniferous; Deciduous; Shrub-Sagebrush; Sagebrush; Pinyon-Juniper; Grass; Agriculture;
Barren; Water; Uncategorized: Pinyon-Juniper, Sagebrush, Agriculture, Unknown.

9 Classes: Forest; Shrub; Pinyon-Juniper; Grass and Agriculture; Barren; Water;
Uncategorized: Pinyon-Juniper, Sagebrush, Agriculture, Unknown.

8 Classes: Forest; Shrub and Pinyon-Juniper; Grass and Agriculture; Barren; Water;
Uncategorized: Pinyon-Juniper, Sagebrush, Agriculture, Unknown.


Table 3. A class confusion table with supplemental information showing results for the total sampled area. Results are from
a pixel by pixel comparison of verification data with Landsat classifications at the 12 class level.
Table 4. A class confusion table with supplemental information showing results for the total sampled area. Results are from
designating classified 9-pixel cells as single classes prior to comparisons between verification data and Landsat classifications
at the 12 class level.
Table 5. A class confusion table with supplemental information showing results for the homogeneous-only cells within the total
sampled area. Results are from a pixel by pixel comparison of verification data with Landsat classifications at the 12
class level.
Table 6. A class confusion table with supplemental information showing results for the homogeneous-only cells within the total
sampled area. Results are from designating classified 9-pixel cells as single classes prior to comparisons between verification
data and Landsat classifications at the 12 class level.
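Tables 4 and 6 compare the data after each classified 9-pixel cell has been designated a single class, and Tables 5 and 6 restrict the comparison to homogeneous cells. The captions do not say how the single class was chosen. The Python sketch below assumes a plurality (most frequent label) rule and treats a cell as homogeneous only when all nine pixels share one class; the function names and the sample cell are hypothetical.

```python
from collections import Counter


def cell_pixels(cell):
    """Flatten a 3 x 3 cell (list of rows) into its 9 pixel labels."""
    pixels = [label for row in cell for label in row]
    if len(pixels) != 9:
        raise ValueError("expected a 3 x 3 cell of 9 pixels")
    return pixels


def cell_class(cell):
    """Single class for the cell: the most frequent pixel label.
    Ties go to the label encountered first (an arbitrary assumption)."""
    return Counter(cell_pixels(cell)).most_common(1)[0][0]


def is_homogeneous(cell):
    """True when all 9 pixels carry the same class."""
    return len(set(cell_pixels(cell))) == 1


# Hypothetical cell: mostly sagebrush (S) with one grass (G) and one barren (B) pixel.
cell = [["S", "S", "G"],
        ["S", "B", "S"],
        ["S", "S", "S"]]
print(cell_class(cell), is_homogeneous(cell))  # S False
```

Under this rule the example cell would enter a Table 4 style comparison as sagebrush but would be excluded from a homogeneous-only comparison.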

* Bounds are provided for the 0.05 confidence level.




Table 7. The change in overall percentage agreement after correcting misregistered
Landsat classifications to correspond to quadrangle maps.

                          Level of Classification
7½' Quadrangle Map        24 Classes   12 Classes   9 Classes   8 Classes
Big Beaver Reservoir      +4.4%        +10.0%       +11.0%      +5.0%
Hamilton                  +4.2%        +7.0%        +7.6%       +5.1%
Hayden                    +1.7%        *            *           +0.2%
Jessup Gulch              -0.2%                     +2.0%       +1.5%
Sagebrush Hill            +0.5%        *            *
Yankee Gulch              +1.4%                     +5.8%       +4.6%






Are the spectral and spatial characteristics of a pixel in correspondence with the
training field characteristics? (Yes / No)

Are the class designations for the verification data in agreement with the
Landsat data? (Yes / No)



Table 9. Significance levels are provided for which the null hypothesis may be rejected.

                                      P(X>Xt)
Class Name                            24 Class Level   12 Class Level
Fir                                   0.01             0.02
Low Vigor Aspen                       0.33             0.50
High Vigor Aspen                      0.10             0.09
Mixed Shrub                           0.01             0.01
Mixed Shrub-Sagebrush                 0.01             0.01
Low Density Sagebrush                 0.22             0.15
High Density Upland Sagebrush         0.27             0.16
High Density Lowland Sagebrush        *                0.58
Low Density Pinyon-Juniper            *                0.27
High Density Pinyon-Juniper           0.08             0.01
Dry Grassland                         0.03             0.07
Dry Meadow                            *                *
Wet Tundra                            *                *
Dryland Agriculture                   0.28             0.28
Irrigated Agriculture                 0.51             0.51

* Insufficient data.
Table 10. Fir class evaluation two-way contingency tables.

Columns: Are the spectral and spatial characteristics of a pixel in correspondence with the
training field characteristics?
Rows: Are the pixel designations for the verification data in agreement with the Landsat data?

          24 Class Level    9 Class Level
          Yes    No         Yes    No
Yes       11     0          13     2
No        5      5          3      2

Table 11. Mixed shrub-sagebrush class evaluation two-way contingency table.

Columns: Are the spectral and spatial characteristics of a pixel in correspondence with the
training field characteristics?
Rows: Are the pixel designations for the verification data in agreement with the Landsat data?

          24 Class Level    9 Class Level
          Yes    No         Yes    No
Yes       7      5          12     2
No        11     40         9      39
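Table 9 does not state which test produced its significance levels. One plausible reading, assumed in the Python sketch below, is an uncorrected Pearson chi-square test of independence on each two-way table, with one degree of freedom. Under that assumption the 24 class level tables for fir and for mixed shrub-sagebrush give p values of roughly 0.007 and 0.011, consistent with the 0.01 listed for both classes. The helper name `chi_square_2x2` is hypothetical. With expected counts as small as those in the fir table, an exact test such as Fisher's may be preferable.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence (no continuity correction)
    for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value); with one degree of freedom,
    P(chi-square > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Rows: verification designation agrees with Landsat (yes, no).
# Columns: pixel corresponds with training field characteristics (yes, no).
fir_24 = chi_square_2x2(11, 0, 5, 5)          # Table 10, 24 class level
shrub_sage_24 = chi_square_2x2(7, 5, 11, 40)  # Table 11, 24 class level
print(round(fir_24[1], 3), round(shrub_sage_24[1], 3))
```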
Figure 3. Photointerpretation Form.









Example of Randomly Located 10-Acre Cells - Verification Form.






Figure 6. Mixed shrub-sagebrush training fields (three of the four
training fields selected), occurring in the Jessup Gulch quadrangle.





Figure 7. Verification cell in the Sagebrush Hill quadrangle, having barren,
mixed shrub, and uncategorized classes. An example of a barren class (pixel
2) and an uncategorized class (pixel 5) designated as low density sagebrush
for the verification data.


Figure 8. High density pinyon-juniper training field, occurring in the
Sagebrush Hill quadrangle.



